Ping-pong Document Clustering using NMF and Linkage-Based Refinement

Authors

  • Hiroyuki Shinnou
  • Minoru Sasaki
Abstract

This paper proposes a ping-pong document clustering method that alternates between NMF and linkage-based refinement in order to improve the clustering result of NMF. NMF can be expected to be effective for document clustering within a ping-pong strategy. In practice, however, it often worsens performance there, because NMF frequently fails to improve the clustering result it is given as initial values. Our method handles this problem through the stop condition of the ping-pong process. In the experiment, we compared our method with k-means and NMF on 16 document data sets. Our method significantly improved the clustering result of NMF.
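The alternation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the refinement procedure or the stop condition, so everything below is an assumption made for illustration: `nmf_refine` uses standard multiplicative-update NMF seeded from the current labels, `linkage_refine` is a simple average-linkage reassignment standing in for the paper's linkage-based refinement, and `ping_pong` stops when a full round leaves the labels unchanged. All function names and parameters are hypothetical.

```python
import numpy as np


def nmf_refine(X, labels, k, iters=50, eps=1e-9):
    """Refine `labels` with NMF (X ~ U @ V.T), seeded from the labels.

    X is a non-negative term-by-document matrix. Assumption: standard
    multiplicative updates for the Frobenius objective.
    """
    n_docs = X.shape[1]
    # Seed V from the current clustering: near one-hot memberships.
    V = np.full((n_docs, k), 0.1)
    V[np.arange(n_docs), labels] += 1.0
    # Seed U with membership-weighted averages of the document vectors.
    U = X @ V / (V.sum(axis=0) + eps)
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    # Each document goes to the cluster with the largest coefficient.
    return V.argmax(axis=1)


def linkage_refine(X, labels, k):
    """Stand-in refinement (assumption): move each document to the cluster
    with the highest average cosine similarity to its other members."""
    n_docs = X.shape[1]
    Xn = X / np.maximum(np.linalg.norm(X, axis=0), 1e-12)
    S = Xn.T @ Xn                       # document-document cosine similarity
    M = np.zeros((n_docs, k))
    M[np.arange(n_docs), labels] = 1.0  # one-hot cluster membership
    sums = S @ M                        # summed similarity to each cluster
    counts = np.tile(M.sum(axis=0), (n_docs, 1))
    rows = np.arange(n_docs)
    sums[rows, labels] -= S[rows, rows]  # exclude self-similarity
    counts[rows, labels] -= 1.0
    avg = sums / np.maximum(counts, 1.0)
    return avg.argmax(axis=1)


def ping_pong(X, labels, k, max_rounds=10):
    """Alternate the two refinements; stop when a round changes nothing.

    Assumption: this stop rule is illustrative, not the paper's condition.
    """
    labels = np.asarray(labels, dtype=int)
    for _ in range(max_rounds):
        after_nmf = nmf_refine(X, labels, k)
        after_link = linkage_refine(X, after_nmf, k)
        if np.array_equal(after_link, labels):
            break
        labels = after_link
    return labels
```

A call such as `ping_pong(X, initial_labels, k=2)` takes a non-negative term-by-document array and an initial clustering (for example from k-means) and returns the refined label array. A faithful implementation would replace both the refinement step and the stop rule with the ones defined in the full paper.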


Similar resources

Refinement of Document Clustering by Using NMF

In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method and is effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result; therefore, we can use NMF as a refinement method. First we perform min-max cu...


Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization

In this paper, we propose a novel non-negative matrix factorization (NMF) to the affinity matrix for document clustering, which enforces nonnegativity and orthogonality constraints simultaneously. With the help of orthogonality constraints, this NMF provides a solution to spectral clustering, which inherits the advantages of spectral clustering and presents a much more reasonable clustering int...


Cluster-based language model for spoken document retrieval using NMF-based document clustering

In this paper, a non-negative matrix factorization (NMF)-based document clustering approach is proposed for the cluster-based language model for spoken document retrieval. The retrieval language model comprises three different unigram models: a whole corpus collect-based unigram, document-based unigram, and a document clustering-based unigram. They are combined with double linear interpolations. ...


Citronellyl Butyrate Synthesis in Non-Conventional Media Using Packed-Bed Immobilized Candida Rugosa Lipase Reactor

The synthesis of citronellyl butyrate by direct esterification reaction catalyzed by immobilized lipase from Candida rugosa was studied in a continuous packed bed reactor using n-hexane as organic solvent. Parameters such as residence time, temperature, and pH were examined. The optimum conversion was obtained at a flow rate of 1 ml/min (residence time 8 min), temperature of 50 °C, and pH 7.5. ...


Efficient Document Clustering via Online Nonnegative Matrix Factorizations

In recent years, Nonnegative Matrix Factorization (NMF) has received considerable interest from the data mining and information retrieval fields. NMF has been successfully applied in document clustering, image representation, and other domains. This study proposes an online NMF (ONMF) algorithm to efficiently handle very large-scale and/or streaming datasets. Unlike conventional NMF solutions w...




Publication date: 2008